Abstract

We establish a link between Fourier optics and a recent construction
from the machine learning community termed the kernel mean map.
Using the Fraunhofer approximation, we identify the kernel with the squared
Fourier transform of the aperture. This allows us to use results about the
invertibility of the kernel mean map to provide a statement about the
invertibility of Fraunhofer diffraction, showing that imaging processes with
arbitrarily small apertures can in principle be invertible, i.e., do not lose
information, provided the objects to be imaged satisfy a generic condition.
A real-world experiment shows that we can super-resolve beyond
the Rayleigh limit.

1 Introduction 

Imaging devices such as telescopes and microscopes collect incoming light using
lenses or mirrors of finite size. This finite size imposes a finite aperture on
the light that reaches the optical system, leading to diffraction effects. In
particular, diffraction ensures that the image of a point can never be a point.
For instance, an imaging system using a lens with an F-number $f/D$ (where $f$ is
the focal length, and $D$ is the diameter of the circular aperture) has an impulse
response function (Airy disk) whose radius is $1.22\,\lambda f/D$ on the sensor,
where $\lambda$ is the wavelength of the light (for simplicity, assumed to be
monochromatic).

Another way to express the same insight uses the transfer function. For a
lens focused at infinity, the transfer function is constant within a circle of
radius $\nu = 1/(2\lambda f/D)$, and zero outside [23, p. 136]. This means, in a
nutshell, that if we try to image a sinusoidal pattern with spatial frequency
larger than $\nu$, diffraction will annihilate that pattern. Likewise, if we
decompose a general object into spatial frequencies by Fourier analysis, all
components with spatial frequency larger than $\nu$ will vanish.

*This article has been accepted for publication at the IEEE Conference on Computer Vision 
and Pattern Recognition (CVPR), Portland, 2013. 



Similar considerations hold true if, say, an object is scanned by a focused
laser beam. Object details below a certain size are washed out,
and this fundamental limit of image-formation systems is often referred to as the
diffraction limit [23, p. 136]. There are ways to circumvent it using sophisticated
hardware, for instance with scanning near-field optical microscopy, or stimulated
emission depletion microscopy (STED) using fluorescence [14], but these are not
the topic of the current paper. Instead, we want to investigate whether restrictions
on the object being imaged can fundamentally change the resolution of an optical
system. Specifically, we will show that under the generic assumption of bounded
support, one can in principle (i.e., given a perfect measurement of the image)
resolve arbitrarily fine detail. This is done by pointing out a connection to the
field of kernel methods in machine learning, and utilizing certain theoretical
results from that domain. We do not claim that all our insights are new.
Indeed, we will point out that in spite of the above received wisdom, there are
certain theoretical results in the optics community, some of them rather old,
that draw similar conclusions. We do believe, however, that the link to kernel
methods is new, and hope that it will lead to a fruitful cross-fertilization of two
previously unconnected branches of research. Using toy examples, we show that
the assumption of bounded support can be used to recover image detail past
the diffraction limit for simple real-world images, which are pixelized and not
noise-free.

The paper is structured as follows. In Section 2, we explain the notion of
kernel means. These are particular types of mappings into reproducing kernel
Hilbert spaces, and in some cases they can be shown to be invertible. The kernel
mean map has applications in a number of tasks including testing of homogeneity
and independence [11, 12]. However, our main interest is a link to wave optics,
to be described in the next section.^1 In Section 3, we explain some basics of
Fourier optics, in particular the Fraunhofer approximation of diffraction. We
show that Fraunhofer diffraction is actually a particular case of kernel mean
mapping. This link between Fourier optics and machine learning allows us to
leverage some theoretical results about kernel mean maps to make a surprising
statement about super-resolved imaging. Section 4 presents experiments on
simulated and real data. Section 5 discusses how this result
relates to certain observations made by the wave optics community.

2 Characteristic kernel means 

A symmetric function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, where
$\mathcal{X}$ is a nonempty set, is called a positive definite (pd) kernel if for
arbitrary points $x_1, \dots, x_m \in \mathcal{X}$ and coefficients
$a_1, \dots, a_m \in \mathbb{R}$, we have

$$\sum_{i,j=1}^{m} a_i a_j k(x_i, x_j) \ge 0.$$

The kernel is called strictly positive definite if moreover for pairwise distinct
points equality with zero, $\sum_{i,j} a_i a_j k(x_i, x_j) = 0$, implies that all
coefficients vanish, $a_i = 0$ for all $i$.
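To make the definition concrete, the following Python sketch evaluates the quadratic form $\sum_{i,j} a_i a_j k(x_i, x_j)$ for random coefficient vectors. The Gaussian kernel, the point set and the number of trials are illustrative assumptions and not part of the construction above; the check only spot-tests the inequality and does not prove positive definiteness.

```python
import math
import random

def gaussian_kernel(x, y, width=1.0):
    """Gaussian kernel exp(-(x - y)^2 / (2 width^2)); an assumed example kernel."""
    return math.exp(-((x - y) ** 2) / (2.0 * width ** 2))

def quadratic_form(kernel, points, coeffs):
    """Evaluate sum_{i,j} a_i a_j k(x_i, x_j)."""
    return sum(
        a_i * a_j * kernel(x_i, x_j)
        for a_i, x_i in zip(coeffs, points)
        for a_j, x_j in zip(coeffs, points)
    )

random.seed(0)
points = [-1.5, -0.2, 0.0, 0.7, 2.0]  # pairwise distinct sample points (hypothetical)
forms = []
for _ in range(100):
    coeffs = [random.uniform(-1.0, 1.0) for _ in points]
    forms.append(quadratic_form(gaussian_kernel, points, coeffs))

# For a pd kernel the smallest value is non-negative up to rounding error.
print(min(forms))
```

Any other candidate kernel can be passed to `quadratic_form` in the same way; a single negative value would show that the candidate is not positive definite.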

Any positive definite kernel induces a mapping

$$x \mapsto k(x, \cdot) \tag{1}$$

^1 This link was pointed out during a mathematical workshop in Oberwolfach, see [25].
into a reproducing kernel Hilbert space (RKHS), which is a Hilbert space of
functions $f : \mathcal{X} \to \mathbb{R}$ with an inner product
$\langle \cdot, \cdot \rangle$ such that $k$ represents point evaluation,

$$\langle f, k(x, \cdot) \rangle = f(x), \tag{2}$$

which also implies the reproducing property
$\langle k(x, \cdot), k(x', \cdot) \rangle = k(x, x')$; see
e.g. [24] for more details.

2.1 Kernel mean of a sample 

In an SVM [24], (1) is the mapping that takes each datapoint into the so-called
feature space, in which a linear learning method is applied. Rather than
mapping the points one by one, however, one can also map a sample or a
distribution directly to its mean in the feature space. Below, we will show that
this kind of mapping contains optical imaging as a special case. Before doing so,
we point out that even though the operation of taking the mean usually
comes with a loss of information, this need not be the case if the kernel satisfies
a certain condition.

Consider a sample of points $X = \{x_1, \dots, x_m\} \subset \mathcal{X}$ that are
distinct, i.e., $x_i \neq x_j$ whenever $i \neq j$. Given a pd kernel $k$, we define
the kernel mean map of $X$ by [24, 28]

$$\mu(X) = \frac{1}{m} \sum_{i=1}^{m} k(x_i, \cdot). \tag{3}$$

Consider another sample of distinct points $Y = \{y_1, \dots, y_n\} \subset \mathcal{X}$.
Clearly, if $X$ equals $Y$, their kernel means are identical. What about the converse?

We call a kernel characteristic for samples if the mean map $\mu$ based on $k$
is injective, i.e., if identical kernel means $\mu(X) = \mu(Y)$ imply identical
samples, $X = Y$.

It is not obvious whether characteristic kernels exist. E.g. for polynomial
kernels $k(x, x') = (\langle x, x' \rangle + 1)^d$, with $d \in \mathbb{N}$, observing
equal kernel means $\mu(X) = \mu(Y)$ for the samples $X$ and $Y$ implies that all
empirical moments up to order $d$ of $X$ and $Y$ coincide. However, $X$ and $Y$ might
differ in their empirical moments of higher orders. The following proposition gives
a sufficient condition for being a characteristic kernel:

Proposition 1 Strictly pd kernels are characteristic for samples. 

Proof: Consider a strictly pd kernel $k$ and its mean map $\mu$. Consider two
samples $X = \{x_1, \dots, x_m\} \subset \mathcal{X}$ and
$Y = \{y_1, \dots, y_n\} \subset \mathcal{X}$ as above with equal
kernel means, $\mu(X) = \mu(Y)$. Let $Z = \{z_1, \dots, z_l\}$ be the set (not the
multiset) of all elements in the union of $X$ and $Y$, i.e. all elements in $Z$ are
pairwise distinct. Let $\#X(z)$ be the number of times $z$ appears in $X$, and
similarly $\#Y(z)$. Define $\gamma_i = \#X(z_i)/m - \#Y(z_i)/n$. Then we have

\begin{align}
0 &= \mu(X) - \mu(Y) \tag{4} \\
  &= \sum_{i=1}^{l} \gamma_i \, k(z_i, \cdot). \tag{5}
\end{align}

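Now take the dot product between (5) and itself, leading to

$$0 = \Big\langle \sum_{i=1}^{l} \gamma_i k(z_i, \cdot), \sum_{j=1}^{l} \gamma_j k(z_j, \cdot) \Big\rangle, \tag{6}$$

which by the reproducing property and bilinearity amounts to

$$0 = \sum_{i,j=1}^{l} \gamma_i \gamma_j k(z_i, z_j). \tag{7}$$

Since $k$ is strictly pd, this implies that for all $i$ the coefficients $\gamma_i$
are zero, thus $\#X(z_i) = \#Y(z_i)\, m/n$. Since $\#X(z_i), \#Y(z_i) \in \{0, 1\}$
and every $z_i$ occurs in at least one of the two samples, both counts must equal
one. We conclude that $m = n$ and $\#X(z_i) = \#Y(z_i)$ for all $i$, i.e., $X = Y$.

■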
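The remark on polynomial kernels preceding Proposition 1 can be checked numerically. The sketch below uses two hypothetical one-dimensional samples with the same first moment but different second moments (both samples and the evaluation grid are assumptions chosen for illustration). The degree-one polynomial kernel yields identical kernel means for them, whereas the degree-two kernel separates them.

```python
def poly_kernel(x, y, degree):
    """Polynomial kernel k(x, y) = (x * y + 1)^degree on the real line."""
    return (x * y + 1.0) ** degree

def kernel_mean(sample, t, degree):
    """Evaluate the kernel mean mu(sample) at the point t."""
    return sum(poly_kernel(x, t, degree) for x in sample) / len(sample)

X = [-1.0, 1.0]
Y = [-2.0, 2.0]                      # different sample, same first moment
grid = [-2.0, -0.5, 0.0, 0.5, 2.0]   # points at which the mean functions are compared

# Largest difference between the two kernel means over the grid.
gap_d1 = max(abs(kernel_mean(X, t, 1) - kernel_mean(Y, t, 1)) for t in grid)
gap_d2 = max(abs(kernel_mean(X, t, 2) - kernel_mean(Y, t, 2)) for t in grid)
print(gap_d1, gap_d2)
```

A degree-two kernel would in turn fail for samples that agree in their first two moments, which is why a strictly pd kernel is needed for injectivity on all samples.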

The mean map has some other interesting properties [28]. Among them is
the fact that $\mu(X)$ represents the operation of taking a mean of a function on
the sample $X$:

$$\langle \mu(X), f \rangle = \Big\langle \frac{1}{m} \sum_{i=1}^{m} k(x_i, \cdot), f \Big\rangle = \frac{1}{m} \sum_{i=1}^{m} f(x_i), \tag{8}$$

where we have applied the point evaluation property.



2.2 Kernel mean of a probability measure 

Instead of samples we next consider probability measures^2 defined on $\mathcal{X}$,
assuming that $\mathcal{X}$ has the necessary additional structure. To ensure that
the following integrals exist, we assume that all considered kernels are bounded
(see [ ]). Below, we will think of the measures as the light distribution of the
object being imaged. We extend the mean map to probability measures by defining the
kernel mean of $P$ as

$$\mu(P) = \int k(x, \cdot) \, dP(x). \tag{9}$$

Similar to the above definition, we call a kernel characteristic for probability
measures [ ] if the mean map is injective for probability measures, i.e.,
$\mu(P) = \mu(Q)$ implies that $P$ and $Q$ are equal.

To state the analog of Proposition 1, we define a kernel $k$ to be integrally
strictly positive definite if for any finite non-zero signed Borel measure $\nu$ the
integral of $k$ with respect to $\nu$ is strictly positive,

$$\iint k(x, x') \, d\nu(x) \, d\nu(x') > 0. \tag{10}$$

Note that an integrally strictly pd kernel is also strictly pd but not vice versa.

Proposition 2 Integrally strictly pd kernels are characteristic for probability 
measures. 

^2 We assume that all measures considered are Borel measures.
This result was proven by [29]; we only provide a brief proof sketch: Consider
two different probability measures $P$ and $Q$. Their difference is a finite non-zero
signed Borel measure $\nu = P - Q$. Assuming equal kernel means, we have:

\begin{align}
0 &= \mu(P) - \mu(Q) \tag{11} \\
  &= \int k(x, \cdot) \, dP(x) - \int k(x, \cdot) \, dQ(x) \tag{12} \\
  &= \int k(x, \cdot) \, d\nu(x). \tag{13}
\end{align}

Taking the squared norm and using the reproducing property we get a contradiction,

\begin{align}
0 &= \Big\langle \int k(x, \cdot) \, d\nu(x), \int k(x, \cdot) \, d\nu(x) \Big\rangle \tag{14} \\
  &= \iint k(x, x') \, d\nu(x) \, d\nu(x') > 0, \tag{15}
\end{align}

where we used for the last inequality the fact that $k$ is integrally strictly pd. ■
A more specific view on characteristic kernels, which will apply in the case
of Fraunhofer imaging, can be obtained by considering translation invariant pd
kernels on $\mathcal{X} = \mathbb{R}^d$, i.e., kernels that can be written as
$k(x, x') = \psi(x - x')$ with some continuous function
$\psi : \mathbb{R}^d \to \mathbb{R}$. By Bochner's theorem [ ], they can be
expressed as the Fourier transform of a finite non-negative Borel measure $\Lambda$,

$$\psi(x) = \int e^{-i x^\top \omega} \, d\Lambda(\omega). \tag{16}$$

Following Corollary 4 in [ ] we can write the squared RKHS distance between
the kernel means of two probability measures in terms of their characteristic
functions,

$$\|\mu(P) - \mu(Q)\|^2 = \int |\phi_P(\omega) - \phi_Q(\omega)|^2 \, d\Lambda(\omega), \tag{17}$$

where $\|\cdot\|$ is the norm of the RKHS and
$\phi_P(\omega) = \int e^{i x^\top \omega} \, dP(x)$ is the characteristic
function of $P$, and likewise $\phi_Q$. Roughly speaking, this shows that $P$ and $Q$
can be distinguished as long as the spectrum $\Lambda$ of the kernel is nonzero
wherever the spectra of the probability distributions might differ. If $\Lambda$ has
full support, i.e. it is non-zero almost everywhere, the corresponding kernel can
distinguish all probability distributions. If it does not have full support, it can
sometimes still distinguish a restricted class of probability distributions, as we
see next.
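The identity (17) can be traced back to Bochner's representation (16) in a few lines. The following is a sketch with $\nu = P - Q$, assuming the sign convention $e^{-i x^\top \omega}$ in (16) and that the order of integration may be exchanged (which is plausible for bounded kernels and finite measures by Fubini's theorem):

```latex
\begin{align*}
\|\mu(P) - \mu(Q)\|^2
  &= \iint \psi(x - x') \, d\nu(x) \, d\nu(x') \\
  &= \iiint e^{-i (x - x')^\top \omega} \, d\Lambda(\omega) \, d\nu(x) \, d\nu(x') \\
  &= \int \Big( \int e^{-i x^\top \omega} \, d\nu(x) \Big)
          \Big( \int e^{i x'^\top \omega} \, d\nu(x') \Big) \, d\Lambda(\omega) \\
  &= \int \big| \phi_P(\omega) - \phi_Q(\omega) \big|^2 \, d\Lambda(\omega).
\end{align*}
```

The last step uses that $\int e^{i x^\top \omega} \, d\nu(x) = \phi_P(\omega) - \phi_Q(\omega)$ and that the other factor is its complex conjugate, since $\nu$ is a real measure.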



2.3 Kernel mean of a probability measure with bounded 
support 

Consider a translation invariant pd kernel $k$ such that the support of the
corresponding $\Lambda$ has a non-empty interior. For what class of probability
measures can such a kernel be characteristic^3? An obvious choice is a class of
probability measures whose characteristic functions agree outside the support of
$\Lambda$.

^3 We use characteristic for a class of probability measures in the obvious way,
i.e. the kernel mean map is injective for the restricted class.
However, there is a much more interesting class of measures which we define
next.

Let us consider a probability measure $P$ with compact support. By the
Paley-Wiener theorem [2f] its characteristic function $\phi_P$ is entire (aka
analytic or holomorphic), which implies that knowing $\phi_P$ on a compact subset
with non-empty interior determines $\phi_P$ everywhere. This leads to the following
proposition:

Proposition 3 Translation invariant pd kernels, whose corresponding $\Lambda$ have
a support with non-empty interior, are characteristic for probability measures
with compact support.

This is a simplification of Theorem 12 in [ ] which also contains a detailed 
proof. 

The kernel which will be relevant in the next section is the sinc kernel, defined
for $\sigma > 0$ as

$$k(x, x') = \psi(x - x') = \frac{\sin \sigma (x - x')}{x - x'}. \tag{18}$$

The Fourier transform of $\psi$ is the scaled indicator function of the interval
$[-\sigma, \sigma]$, i.e.

$$\Lambda(\omega) = c \, \mathbf{1}_{[-\sigma, \sigma]}(\omega), \tag{19}$$

with a scale factor $c > 0$ that depends on the normalization of the Fourier
transform. Thus $\Lambda$ is non-zero on that interval (its support has a non-empty
interior), and the sinc kernel is characteristic for probability measures of bounded
support. The square of the sinc kernel has the same properties, since it corresponds
to the convolution of $\Lambda$ with itself, which inherits a support with non-empty
interior from $\Lambda$.
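The claim about the squared kernel can be made explicit with a short calculation. A product in the spatial domain corresponds to a convolution in the frequency domain, so the spectrum of $\psi^2$ is proportional to the convolution of the box spectrum with itself. Writing the box as $c \, \mathbf{1}_{[-\sigma, \sigma]}$ with a scale factor $c > 0$ (an assumption of this sketch, as is ignoring the normalization constant of the convolution theorem), one obtains a triangle function:

```latex
\begin{align*}
(\Lambda * \Lambda)(\omega)
  &= c^2 \int \mathbf{1}_{[-\sigma, \sigma]}(\omega') \,
              \mathbf{1}_{[-\sigma, \sigma]}(\omega - \omega') \, d\omega' \\
  &= c^2 \, \max\bigl(0, \; 2\sigma - |\omega|\bigr).
\end{align*}
```

This is strictly positive on $(-2\sigma, 2\sigma)$, so its support again has a non-empty interior and Proposition 3 applies to the squared sinc kernel as well.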



3 Incoherent imaging as a mean map 

3.1 Imaging under incoherent illumination 

As electromagnetic radiation, light is governed by Maxwell's equations, a set of
linear partial differential equations that form the foundation of classical
electrodynamics including classical optics. Although electric and magnetic fields are
vectorial in nature, in many situations^4 polarisation effects, i.e. any coupling
between the electric and magnetic fields, can be neglected and all components
of the electric and magnetic field can be well described by a single scalar wave
equation [15]

$$\Big( \nabla^2 - \frac{n_0^2}{c^2} \frac{\partial^2}{\partial t^2} \Big) \Phi(u, t) = 0, \tag{20}$$

where $\Phi(u, t)$ is any of the scalar field components of the electric or magnetic
field, $c$ is the speed of light in vacuum, and $n_0$ denotes the refractive index of
the medium within which the light is propagating. Since (20) is a linear partial
differential equation, any linear

^4 More precisely, the scalar theory of electromagnetism is valid in linear,
isotropic, homogeneous and non-dispersive dielectric media such as free space or a
lens with constant refractive index, where all components of the electric and
magnetic field behave identically.
combination of its solutions yields another solution. The property of linearity
has major implications for the mathematical treatment as it allows us to analyse
a system by studying its response to a single point stimulus. Its effect on a
complex input signal $\Phi(\xi, t)$ can be obtained by considering the input signal
as being composed of point stimuli and adding up their known responses accordingly:

$$\Psi(u, t) = \int h(u - \xi) \, \Phi(\xi, t) \, d\xi. \tag{21}$$

Here $\Psi$ denotes the output of a linear optical system which is fully described
by its impulse response $h(u - \xi)$. For ease of exposition we implicitly assume
stationarity both in space (i.e. $h(u; \xi) = h(u - \xi)$) and time (i.e. $h$ does
not depend on $t$) in (21).

Optical detectors such as CCD sensors usually record intensities, i.e. the
square of the field amplitude. Since the integration time is much longer than a
single period of oscillation, we must average over time to obtain the recorded
pixel intensities

\begin{align}
\langle \Psi(u, t) \Psi^*(u, t) \rangle = \iint & h(u - \xi) \, h^*(u - \xi') \, \times \tag{22} \\
  & \langle \Phi(\xi, t) \Phi^*(\xi', t) \rangle \, d\xi \, d\xi', \tag{23}
\end{align}

where $\langle \cdot \rangle$ denotes temporal averaging. Here, we must take the
coherence properties of the light into account and distinguish between coherent and
incoherent illumination:

• In the case of coherent illumination, we cannot simplify Equation (23)
any further without making additional assumptions. The square of
the complex field can lead to cancellations or other non-linear interference
effects.

• In the case of incoherent illumination, the spatial correlation between any
two light rays emitted from the scene is assumed to be negligible. Hence,
the time average in (23) will only contribute to the integral for $\xi = \xi'$:

$$\langle \Phi(\xi, t) \Phi^*(\xi', t) \rangle = \langle |\Phi(\xi, t)|^2 \rangle \, \delta(\xi - \xi'). \tag{24}$$

Plugging expression (24) into Equation (23) yields the incoherent imaging equation

$$q(u) = \int f(u - \xi) \, p(\xi) \, d\xi, \tag{25}$$

where we introduced $q(u)$, $p(\xi)$ and $f(u - \xi)$ for
$\langle |\Psi(u, t)|^2 \rangle$, $\langle |\Phi(\xi, t)|^2 \rangle$ and
$|h(u - \xi)|^2$, respectively. Both $p(\xi)$ and $q(u)$ describe image intensities;
the impulse response $f$ is called the point spread function (PSF) of the imaging
system as it corresponds to the image of a point light source.

Although we had to make a number of assumptions to derive the incoherent
imaging equation (25), it has been found to provide an accurate description
for most typical imaging systems including astronomical imaging, microscopy
and photography [2].
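A discretised reading of the incoherent imaging equation (25) for point sources is sketched below in Python. The squared sinc PSF of a one-dimensional slit, the aperture parameter `sigma`, the source positions and the sampling grid are all assumptions chosen for illustration. The two sources are placed at half the distance to the first zero of the PSF, and the sketch reports where the summed image intensity peaks.

```python
import math

def sinc2_psf(x, sigma):
    """Squared sinc PSF (sin(sigma x) / (sigma x))^2, normalised so that f(0) = 1."""
    if x == 0.0:
        return 1.0
    return (math.sin(sigma * x) / (sigma * x)) ** 2

def image_intensity(u, sources, sigma):
    """q(u) = sum_k w_k f(u - xi_k): Eq. (25) for an object made of point sources.

    `sources` is a list of (position, intensity) pairs.
    """
    return sum(w * sinc2_psf(u - xi, sigma) for xi, w in sources)

sigma = 1.0                       # assumed aperture half-width (arbitrary units)
first_zero = math.pi / sigma      # first zero of the PSF
separation = 0.5 * first_zero     # sources closer than the first zero
sources = [(-separation / 2, 1.0), (separation / 2, 1.0)]

grid = [i * 0.01 for i in range(-300, 301)]
q = [image_intensity(u, sources, sigma) for u in grid]

# Position of the brightest image point on the grid.
peak_position = grid[q.index(max(q))]
print(peak_position)
```

In this configuration the brightest point lies midway between the sources rather than at either source position, so the two sources merge into a single blob in the image.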
3.2 Connection to kernel mean map



As an image is inherently non-negative, the image of the object $p(\xi)$ induces, up
to normalization, a probability measure $P$. In addition we assume finite energy,
i.e., $\int p(\xi)^2 \, d\xi < \infty$. Then Eq. (25) can be understood such that for
the translation-invariant kernel function $k(u, \xi) = f(u - \xi)$, the resulting
image $q$ is the kernel mean of $P$:

$$q(\cdot) = \int f(\cdot - \xi) \, dP(\xi) = \int k(\cdot, \xi) \, dP(\xi) = \mu(P). \tag{26}$$

So we obtain the interesting result that the incoherent imaging equation can
be expressed as a kernel mean.^5

3.3 Fraunhofer diffraction 

The resolution of any optical system even without optical aberrations is limited 
by diffraction. The mathematical framework describing diffraction is Fourier
optics (see e.g. [23]). It decomposes the light radiated by an object into harmonic
components of different spatial frequencies, each one corresponding to a plane 
wave whose amplitude is given by the Fourier transform of the emitted light 
field. It turns out that at a far distance from the object, most of these waves 
cancel each other, and each direction in space only 'sees' one of the plane waves 
— the free-space wave propagation can be identified with the Fourier transform, 
different spatial frequencies in the object corresponding to one direction each. 
This is referred to as the Fraunhofer approximation. By means of a lens, this 
situation can be realised also for a finite distance, and different directions in 
space correspond to different coordinates on the image plane, or camera sensor. 

In an ideal, aberration-free optical system, the Fraunhofer approximation 
states that the PSF is the inverse Fourier transform of the auto-correlation 
function of the pupil or aperture function [10]. In the following we compute the 
PSF for the simple case of a circular planar aperture. 

3.4 Diffraction in one dimension 

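In one dimension, consider an aperture $a : \mathbb{R} \to \mathbb{R}$ defined as
$a(\omega) = \mathbf{1}_{[-\sigma, \sigma]}(\omega)$. The inverse Fourier transform
of $a$ is, up to a constant factor, the sinc function $\sin(\sigma x)/x$. Then by
the Wiener-Khinchin theorem the PSF $f$, as the inverse Fourier transform of the
auto-correlation function of the aperture function $a$, is the square of the sinc
function,

$$f(x) \propto \Big( \frac{\sin(\sigma x)}{x} \Big)^2. \tag{27}$$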
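The one-dimensional case can be worked out explicitly. The following sketch ignores the normalization constant of the Fourier transform (an assumption, since the convention is not fixed in the text) and computes the inverse transform of the box aperture of half-width $\sigma$ directly:

```latex
\begin{align*}
\int_{-\sigma}^{\sigma} e^{i \omega x} \, d\omega
  &= \frac{e^{i \sigma x} - e^{-i \sigma x}}{i x}
   = \frac{2 \sin(\sigma x)}{x}, \\
\Big| \frac{2 \sin(\sigma x)}{x} \Big|^2
  &= \frac{4 \sin^2(\sigma x)}{x^2}.
\end{align*}
```

The squared modulus in the second line is the PSF up to a constant, and its first zero lies at $x = \pi / \sigma$, so a smaller aperture (smaller $\sigma$) gives a wider PSF.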



3.5 Diffraction in two dimensions 

In more than one dimension, the incoherent imaging equation can likewise be
expressed as a kernel mean. To this end, we consider a two-dimensional circular
aperture with

^5 This provides a physical interpretation of the kernel as the point response of an
optical system. This kind of interpretation can be beneficial also for other systems,
and indeed it is suggested by the view of kernels as Green's functions [16, 21]: the
kernel $k$ can be viewed as the Green's function of $P^*P$, where $P$ is a
regularization operator such that the RKHS norm can be written as
$\|f\|_k = \|Pf\|$. For instance, the Gaussian kernel corresponds to a
regularization operator which computes an infinite series of derivatives of $f$.
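radius $\sigma$, where the aperture function is the pillbox function:

$$a(\omega) = \begin{cases} 1 & \text{if } \|\omega\| \le \sigma \\ 0 & \text{otherwise.} \end{cases} \tag{28}$$

Again, the PSF is the (inverse) Fourier transform of the auto-correlation function,
which in this case is the squared Bessel function of the first kind of order one,
divided by the squared radius (the Airy pattern):

$$f(x) \propto \Big( \frac{J_1(\sigma \|x\|)}{\|x\|} \Big)^2. \tag{29}$$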



Note that any translation-invariant kernel k constructed from a positive aper- 
ture function is pd due to Bochner's theorem, so the corresponding diffraction 
can be written as a kernel mean as in Eq. (26). Note that in addition to the 
two apertures discussed so far, we could use arbitrary apertures satisfying the 
condition of Proposition 3, including apertures that are not indicator functions 
(if physically realizable): Bochner's theorem ensures that for all nonnegative 
measures, the Fourier transform is a pd kernel, and Proposition 3 ensures that 
the kernels are characteristic. 

3.6 Breaking the diffraction limit 

The actual resolution that is possible with a given optical system is determined 
by the size of the aperture, which could be the size of the mirror or lens in a 
telescope. 

Having written the incoherent imaging equation as a kernel mean, we can
apply the insight from the previous section to obtain the surprising result that
for an object $p(\xi)$ with bounded support, i.e. $p(\xi)$ is zero outside some
compact area, Fraunhofer diffraction does not destroy any information, i.e. at least
theoretically, the diffraction limit is no limit:

Proposition 4 An object with bounded support can be recovered completely 
from its diffraction-limited image. 

Proof: This follows from the injectivity of $\mu$ in the context of Proposition 3 and
the fact that any aperture shape induces a translation-invariant pd kernel by
Bochner's theorem.

Note that this proposition only states that the kernel mean map is invertible;
it does not make a statement about the practical problem of how to compute
the inverse. In the next section we present a simple approach to do so.

4 Experiments 

Fig. 1 illustrates a typical experimental setup: two point sources (in green and
blue on the left) are imaged through an optical system consisting here of a single
lens (with focal length $f$) and a finite aperture of diameter $D$. Under incoherent
illumination the observed image on the right is a superposition of the images of
the point sources, each of which is given by the impulse response of the optical
system. In an ideal diffraction-limited optical system, two point sources can
only be resolved if they are at least $1.22\,\lambda f/D$ apart. To demonstrate that
we can resolve beyond this so-called Rayleigh limit, we place the two point sources
so close that their individual images cannot be resolved (i.e. the red dashed line
in Fig. 1 has only one maximum).
Figure 1: A one-dimensional double star (two delta peaks on the left) gets
imaged by the lens with the finite aperture, leading to a blurred image formed
by the sum of two squared sinc functions on the right.

4.1 Recovering a one-dimensional simulated image 

The recorded image is usually corrupted by measurement noise, sometimes modeled
as additive Gaussian. Then Eq. (26) becomes $q(\cdot) = \mu(P) + n$ where
$n \sim \mathcal{N}(0, \sigma)$. The first row of Fig. 2 shows the true object
(green) and the observed image (gray) of a one-dimensional toy example for
increasing amounts of noise (from left to right). More precisely, we represent the
true object $p$ and the recorded image $q$ as finite-length one-dimensional column
vectors $u$ and $v$. According to the Fraunhofer diffraction equation, the
relationship between the object $u$ and image $v$ is linear and can be expressed as
a matrix:

$$v = F^H T F Z u + n. \tag{30}$$

Here, $Z$ is a zero-padding matrix, $F$ is the discrete Fourier transform matrix,
$F^H$ the Hermitian transpose of $F$ (i.e. the inverse transform), and $T$ is the
optical transfer function (OTF) written as a diagonal matrix, i.e. its diagonal is
the Fourier transform of the system's impulse response, $F\psi$, with $\psi$ being a
finite-dimensional vector, too.
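The structure of the forward matrix in (30) can be checked on a small example. The Python sketch below builds $A = F^H T F Z$ explicitly and is meant only as an illustration: the unitary normalization of $F$, the toy PSF vector and the choice that $Z$ places the object in the first $m$ entries are assumptions, and a different normalization of $F$ merely rescales $A$. By the convolution theorem, $A$ should consist of the first $m$ columns of the circulant matrix generated by the PSF.

```python
import cmath
import math

def unitary_dft(n):
    """Unitary DFT matrix F with F[k][j] = exp(-2 pi i k j / n) / sqrt(n)."""
    return [[cmath.exp(-2j * cmath.pi * k * j / n) / math.sqrt(n) for j in range(n)]
            for k in range(n)]

def forward_matrix(psf, m):
    """Build A = F^H T F Z for a length-n PSF vector and an object of length m <= n."""
    n = len(psf)
    F = unitary_dft(n)
    # Diagonal of T: unnormalised DFT of the PSF (the sampled OTF).
    otf = [sum(psf[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
           for k in range(n)]
    # Z pads the object with zeros, so A keeps only the first m columns of F^H T F.
    A = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            entry = sum(F[k][i].conjugate() * otf[k] * F[k][j] for k in range(n))
            A[i][j] = entry.real  # imaginary part is rounding noise for a real PSF
    return A

psf = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]  # assumed symmetric toy PSF, n = 8
A = forward_matrix(psf, m=4)
print(A[0])
```

Each entry `A[i][j]` agrees with `psf[(i - j) % n]` up to rounding, so applying the matrix is the same as circularly convolving the zero-padded object with the PSF.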

The object $u$ can be recovered from $v$ by a maximum likelihood approach,
i.e. we solve the following least-squares problem

$$\min_u \|v - F^H T F Z u\|^2. \tag{31}$$

The middle row of Fig. 2 shows the recovered objects $u$ of the noisy observations
$v$ (first row in gray) using the Matlab command

u = (F'*T*F*Z) \ v; 

As suggested by our findings in Section 3, the true signal can be recovered
exactly in the noise-free case (first column). The assumption of bounded support
is implicit in choosing $u$ to be shorter than $v$. However, already small amounts
of noise render the optimisation problem in Eq. (31) ill-conditioned, yielding an
unstable solution.
Figure 2: Restoring a diffraction-limited image (gray, first row) of a
one-dimensional double star (green, first row) with increasing amounts of noise
(from left to right). The maximum likelihood solution (blue, second row) restores
the double star only in the noise-free case (left column). The non-negatively
constrained maximum likelihood approach (blue, third row) restores the double star
even for various amounts of noise (third row, left to right).
As an image accounts for the number of recorded photons, we can employ
non-negativity as an additional physical constraint. Hence, instead of Eq. (31)
we solve the constrained optimization problem

$$\min_u \|v - F^H T F Z u\|^2 \quad \text{s.t.} \quad u \ge 0. \tag{32}$$

The non-negativity constraint stabilizes the restoration process and yields good
results even for large amounts of noise (bottom row in Fig. 2). We solve the
non-negative least squares problem using the Matlab command:

u = lsqnonneg(F'*T*F*Z, v);
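For readers without Matlab, the constrained problem (32) can be approximated with a few lines of plain Python. The sketch below is not the algorithm behind `lsqnonneg`; it uses projected gradient descent as a simple stand-in, and the matrix sizes, the squared sinc PSF, the aperture parameter and the iteration count are all assumptions chosen for illustration. The forward matrix is built directly as a circulant blur restricted to the first `m` columns, which plays the role of $F^H T F Z$.

```python
import math

def sinc2(x, sigma):
    """Squared sinc PSF sample, normalised to 1 at x = 0."""
    return 1.0 if x == 0 else (math.sin(sigma * x) / (sigma * x)) ** 2

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def objective(A, u, v):
    """Least-squares objective ||v - A u||^2."""
    return sum((vi - ri) ** 2 for vi, ri in zip(v, matvec(A, u)))

def nnls_projected_gradient(A, v, iterations=2000):
    """Approximately minimise ||v - A u||^2 subject to u >= 0."""
    n, m = len(A), len(A[0])
    # ||A||_1 * ||A||_inf bounds the squared spectral norm, so this step is safe.
    max_col = max(sum(abs(A[i][j]) for i in range(n)) for j in range(m))
    max_row = max(sum(abs(a) for a in row) for row in A)
    step = 1.0 / (max_col * max_row)
    At = [list(col) for col in zip(*A)]
    u = [0.0] * m
    for _ in range(iterations):
        residual = [ri - vi for ri, vi in zip(matvec(A, u), v)]
        grad = matvec(At, residual)          # gradient of 0.5 * ||A u - v||^2
        # Gradient step followed by projection onto the non-negative orthant.
        u = [max(0.0, ui - step * gi) for ui, gi in zip(u, grad)]
    return u

n, m, sigma = 32, 8, 0.6                     # assumed toy sizes and aperture parameter
# Circulant blur restricted to the first m columns (the bounded support).
A = [[sinc2(min((i - j) % n, (j - i) % n), sigma) for j in range(m)] for i in range(n)]
u_true = [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]   # hypothetical double star
v = matvec(A, u_true)                        # noise-free observation
u_hat = nnls_projected_gradient(A, v)
print(objective(A, u_hat, v))
```

Projected gradient descent was chosen here only because it fits in a few lines; it converges slowly on ill-conditioned problems, so a dedicated non-negative least squares solver is preferable in practice.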

4.2 Recovering a two-dimensional real image 

We built an experimental setup with an artificial double star (illuminated by green
light) that is imaged by a cooled camera (PCO.2000) at a distance of about one
meter. The optics of the camera consist of a changeable aperture and a single
lens ($f = 100$ mm). Panel (d) of Fig. 3 shows a "ground truth" image that has
been taken with an aperture of 4 mm and an exposure time of 3 ms. Panel (a) shows
the same double star but with an aperture of 0.5 mm. The aperture has been chosen
such that the angular separation of the double star is 50 percent below the Rayleigh
limit. Note that the two stars are not visible anymore and the light has been
spread out due to diffraction. To get a good measurement we had to expose
for 4000 ms. Both images, (a) and (d), are the result of averaging eight images
minus an averaged dark frame to reduce the noise to a minimum. The support
is chosen by thresholding the measured image, panel (a). Applying the method
described in the previous paragraph to the image in panel (a), we are able to
recover the two stars, which are quite similar to the ground truth (panels
(c) and (d) in Fig. 3). Note that the ground truth is more blurry since it is also
photographed with a finite aperture.
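To get a feeling for the scales involved, the Airy radius formula $1.22\,\lambda f/D$ from the introduction can be evaluated for the two apertures of this setup. The Python sketch below is a rough estimate under stated assumptions: the wavelength of 532 nm for "green light" is not given in the text, and the image distance is taken to be the focal length even though the object sits at about one meter.

```python
def airy_radius(wavelength, focal_length, aperture_diameter):
    """Radius 1.22 * lambda * f / D of the Airy disk on the sensor (lengths in metres)."""
    return 1.22 * wavelength * focal_length / aperture_diameter

wavelength = 532e-9      # assumed value for "green light"; not stated in the text
focal_length = 100e-3    # f = 100 mm, from the setup above

radius_small = airy_radius(wavelength, focal_length, 0.5e-3)  # 0.5 mm aperture
radius_large = airy_radius(wavelength, focal_length, 4e-3)    # 4 mm aperture

# Ratio of the two blur radii; independent of the assumed wavelength.
print(radius_small / radius_large)
```

Under these assumptions the small aperture blurs a point over roughly 130 micrometres on the sensor, eight times more than the 4 mm aperture used for the ground truth image.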

5 Related work 

The question whether it is possible to break the diffraction limit has been the 
subject of numerous works: 

In 1952, Toraldo di Francia [ ] stated that "we notice that the classical limit
of $1.22\lambda/D$, which has always been accepted as a theoretical limit, proves
instead to be only a practical limit." Motivated by "super-gain antennas" he studies
the diffraction patterns of "super-resolving pupils" which consist of concentric
rings instead of a uniform pupil. He observes that for an increasing number
of rings the central disc of the Airy disc becomes smaller and more isolated,
thereby increasing the resolution. In [ ], the same author discusses the problem
of resolving power from the point of view of information theory. He makes the
point that several objects can lead to the same image, so without an "infinite"
amount of prior information we cannot do two-point resolution.

A few years later, Wolter showed in [ ] that bounded illumination (cf.
our bounded support assumption on the object) is sufficient to recover higher
frequencies, since the Fourier transform of a bounded object is analytic. He
uses accelerating summation techniques to analytically continue the spectrum
that has been cut off by an aperture. Independently of Wolter, Harris [13]
[Figure 3 panels: (a) v, aperture = 0.5 mm; (b) recovered image passed through the
forward model; (c) u, recovered image; (d) ground truth, aperture = 4 mm]

Figure 3: Real photograph of an artificial double star that is clearly visible if
the aperture is open (d), but not for small aperture (a). The recovered image
(c) shows the two stars without blur; (b) shows the result of passing (c) through
the forward model. All images show crops (size 60 x 50) of larger images (size
647 x 570).
also considered bounded objects and the fact that their Fourier transforms are
analytic. He also proposed a method for analytic continuation (for the noise-free
case). His conclusion is that "diffraction imposes a resolution limit which is
determined by the noise of the system rather than by some absolute criterion."

Barnes [1] proposed a reconstruction procedure for coherent illumination. 
He uses the assumption of bounded support to write the convolution operator 
in the imaging equation in such a way that it can be decomposed into prolate 
spheroidal wave functions [27]. This allows inversion of that operator, similar to 
division in Fourier space. Rushforth and Harris [22] study the influence of noise 
on reconstruction methods to overcome the diffraction limit. Their conclusion is 
that "the Rayleigh criterion is an approximate measure of the resolution which 
can be achieved easily." 

Gerchberg [8] (and independently Papoulis [19]) proposed an algorithm analogous
to Gerchberg and Saxton's phase retrieval method [9], incorporating also
positivity. As Jones [18] points out, this algorithm converges only rather slowly
under certain conditions.

Although the above works have provided insight into theoretical aspects of
recovering object properties beyond the diffraction limit, the proposed methods
did not become relevant in practice. In 1993, Sementilli, Hunt and Nadar [26]
derived bounds on the bandwidth extension in terms of object size and noise
variance under the assumption of bounded object support and positivity.
Section 6.6 of Goodman's book on Fourier Optics [10] discusses these early studies
of the diffraction limits and concludes that "the Rayleigh limit to resolution
represents a practical limit to the resolution that can be achieved with a
conventional imaging system."

Several papers consider a bounded support constraint to overcome the diffraction
limit. Another possible constraint is sparsity: Donoho [ ] studied the problem
of recovering a sparse signal for which only low frequencies of its Fourier
transform are available. Recently, Candès and Fernandez-Granda [3] also studied
conditions under which sparse signals can be recovered. The results apply
to signals which have a sparse representation. Sparsity has effectively also been
used in practice to break the diffraction limit using hardware, e.g. in stimulated
emission depletion microscopy (STED) [14].

Finally, one should mention that the works above consider super-resolution
as the problem of breaking the diffraction limit, as opposed to trying to "only"
increase the resolution of low-resolution sensors (e.g. [17]). This type of
super-resolution is not the topic of this paper, so we refer the reader to the
review of Park, Park and Kang [20].

6 Conclusion 

We have developed a novel connection between machine learning and Fourier
optics, identifying a positive definite kernel with the squared Fourier transform
of an imaging system's aperture. Leveraging results from RKHS theory, this
led to a condition on an object (boundedness of its support) which ensures that
it can be fully reconstructed from the image. Simple experiments showed that
such reconstructions are possible with real data. While we do not claim that
our approach has immediate practical implications, we believe it is surprising
and noteworthy that a celebrated result in Fourier optics can be analyzed using
the theory of positive definite kernels used in machine learning, with nontrivial
implications for the profound problem of optical super-resolution. We hope this
link can be further exploited to gain a better understanding of, and possibly novel
solutions to, optical problems. In an experimental setup we show that we are
able to super-resolve beyond the Rayleigh limit.